AITopics | valid generalization

What Size Net Gives Valid Generalization?

Neural Information Processing Systems, Feb-17-2024, 23:57:48 GMT

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. We show that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 - ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 - ε of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/ε) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 - ε fraction of the future test examples.

future test example, random training example, valid generalization, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
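
To get a feel for the two sample-size bounds in the abstract above, here is a minimal Python sketch. It reads the upper bound as growing like (W/ε) log(N/ε) and the lower bound like W/ε, where ε is the target error rate. The abstract does not state the constants hidden by the O and Ω notation, so the parameter `c` (default 1.0) and both function names are assumptions made purely for illustration; the outputs are orders of magnitude, not guarantees.

```python
import math


def sufficient_examples(num_weights, num_nodes, eps, c=1.0):
    """Order-of-magnitude estimate c * (W / eps) * log(N / eps).

    `c` is a placeholder for the unspecified constant in the O(.) bound.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.ceil(c * (num_weights / eps) * math.log(num_nodes / eps))


def necessary_examples(num_weights, eps, c=1.0):
    """Order-of-magnitude estimate c * W / eps for the lower bound.

    `c` is again a placeholder for the unspecified constant.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.ceil(c * num_weights / eps)


if __name__ == "__main__":
    # Hypothetical network: 1000 weights, 100 nodes, target error 0.25.
    W, N, eps = 1000, 100, 0.25
    print("lower-bound scale:", necessary_examples(W, eps))
    print("upper-bound scale:", sufficient_examples(W, N, eps))
```

Under these assumptions the two estimates differ only by the log(N/ε) factor, which is the gap between the upper and lower bounds that the abstract describes.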

For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Neural Information Processing Systems, Apr-6-2023, 18:17:30 GMT

This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. More specifically, consider an l-layer feed-forward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A. The misclassification probability converges to an error estimate (that is closely related to squared error on the training set) at rate O((cA)^(l(l+1)/2) sqrt((log n)/m)) ignoring log factors, where m is the number of training patterns, n is the input dimension, and c is a constant. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training.

neural network, training pattern, valid generalization, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.63)
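
The convergence rate quoted in the abstract above can be explored numerically. The sketch below reads it as (cA)^(l(l+1)/2) * sqrt((log n)/m) and drops the log factors, as the abstract does. The value of the constant c is not given in the abstract, so the default `c=1.0` and the function name are assumptions for illustration only.

```python
import math


def weight_based_rate(weight_bound, num_layers, input_dim, num_patterns, c=1.0):
    """Evaluate (c*A)**(l*(l+1)/2) * sqrt(log(n) / m).

    weight_bound -- A, the bound on the sum of weight magnitudes per unit
    num_layers   -- l, the number of layers
    input_dim    -- n, the input dimension (must exceed 1 so log(n) > 0)
    num_patterns -- m, the number of training patterns
    c            -- placeholder for the unspecified constant in the rate
    """
    if input_dim <= 1 or num_patterns <= 0:
        raise ValueError("need input_dim > 1 and num_patterns > 0")
    exponent = num_layers * (num_layers + 1) / 2
    return (c * weight_bound) ** exponent * math.sqrt(
        math.log(input_dim) / num_patterns
    )


if __name__ == "__main__":
    # Hypothetical 2-layer network on 10-dimensional inputs.
    for m in (100, 10_000):
        for A in (1.0, 2.0):
            print(f"m={m:6d}  A={A}  rate={weight_based_rate(A, 2, 10, m):.4f}")
```

The number of weights does not appear among the arguments: in this reading the rate depends on the network only through the weight bound A and the depth l, which is the point the title makes.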

For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Bartlett, Peter L.

Neural Information Processing Systems, Dec-31-1997

Baum and Haussler [4] used these results to give sample size bounds for multi-layer threshold networks that grow at least as quickly as the number of weights (see also [7]). However, for pattern classification applications the VC-bounds seem loose; neural networks often perform successfully with training sets that are considerably smaller than the number of weights. This paper shows that for classification problems on which neural networks perform well, if the weights are not too big, the size of the weights determines the generalization performance. In contrast with the function classes and algorithms considered in the VC-theory, neural networks used for binary classification problems have real-valued outputs, and learning algorithms typically attempt to minimize the squared error of the network output over a training set. As well as encouraging the correct classification, this tends to push the output away from zero and towards the target values of {-1, 1}.

dimension, fat-shattering dimension, misclassification probability, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Bartlett, Peter L.

Neural Information Processing Systems, Dec-31-1997

Baum and Haussler [4] used these results to give sample size bounds for multi-layer threshold networks that grow at least as quickly as the number of weights (see also [7]). However, for pattern classification applications the VC-bounds seem loose; neural networks often perform successfully with training sets that are considerably smaller than the number of weights. This paper shows that for classification problems on which neural networks perform well, if the weights are not too big, the size of the weights determines the generalization performance. In contrast with the function classes and algorithms considered in the VC-theory, neural networks used for binary classification problems have real-valued outputs, and learning algorithms typically attempt to minimize the squared error of the network output over a training set. As well as encouraging the correct classification, this tends to push the output away from zero and towards the target values of {-1, 1}.

dimension, fat-shattering dimension, misclassification probability, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)

What Size Net Gives Valid Generalization?

Baum, Eric B., Haussler, David

Neural Information Processing Systems, Dec-31-1989

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size.

architecture, probability, training example, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.72)

What Size Net Gives Valid Generalization?

Baum, Eric B., Haussler, David

Neural Information Processing Systems, Dec-31-1989

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size.

artificial intelligence, inductive learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.73)

What size net gives valid generalization?

Baum, E., Haussler, D.

Classics, Feb-1-1989

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. We show that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 - ε/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 - ε of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/ε) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 - ε fraction of the future test examples.

artificial intelligence, machine learning, valid generalization, (5 more...)

Classics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
